discriminant information


A Semi-Supervised Adaptive Discriminative Discretization Method Improving Discrimination Power of Regularized Naive Bayes

Wang, Shihe, Ren, Jianfeng, Bai, Ruibin

arXiv.org Artificial Intelligence

Recently, many improved naive Bayes methods have been developed with enhanced discrimination capabilities. Among them, regularized naive Bayes (RNB) achieves excellent performance by balancing discrimination power and generalization capability. Data discretization is important in naive Bayes: by grouping similar values into one interval, the data distribution can be better estimated. However, existing methods, including RNB, often discretize the data into too few intervals, which may result in significant information loss. To address this problem, we propose a semi-supervised adaptive discriminative discretization framework for naive Bayes, which better estimates the data distribution by utilizing both labeled and unlabeled data through pseudo-labeling. The proposed method also significantly reduces the information loss during discretization via an adaptive discriminative discretization scheme, and hence greatly improves the discrimination power of classifiers. The proposed RNB+, i.e., regularized naive Bayes utilizing the proposed discretization framework, is systematically evaluated on a wide range of machine-learning datasets, where it significantly and consistently outperforms state-of-the-art naive Bayes classifiers.
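
The abstract does not spell out the discretization algorithm, but the pseudo-labeling loop it describes is easy to illustrate. Below is a minimal sketch in Python, using scikit-learn's KBinsDiscretizer and CategoricalNB as stand-ins for the paper's adaptive discriminative discretizer and regularized naive Bayes; the structure (train on labeled data, pseudo-label the unlabeled data, re-estimate the discretization from both) is the point, not the specific components.

# Minimal sketch of pseudo-labeling for semi-supervised discretization
# (illustrative only; not the authors' RNB+ algorithm). Assumes scikit-learn.
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import KBinsDiscretizer

def fit_with_pseudo_labels(X_lab, y_lab, X_unlab, n_bins=10):
    # 1. Discretize using labeled data only and train an initial NB model.
    disc = KBinsDiscretizer(n_bins=n_bins, encode="ordinal", strategy="quantile")
    nb = CategoricalNB(min_categories=n_bins).fit(disc.fit_transform(X_lab), y_lab)
    # 2. Assign pseudo-labels to the unlabeled data.
    y_pseudo = nb.predict(disc.transform(X_unlab))
    # 3. Re-estimate the discretization and the class-conditional
    #    distributions from labeled + pseudo-labeled data together.
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pseudo])
    disc = KBinsDiscretizer(n_bins=n_bins, encode="ordinal", strategy="quantile")
    nb = CategoricalNB(min_categories=n_bins).fit(disc.fit_transform(X_all), y_all)
    return disc, nb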


A Max-relevance-min-divergence Criterion for Data Discretization with Applications on Naive Bayes

Wang, Shihe, Ren, Jianfeng, Bai, Ruibin, Yao, Yuan, Jiang, Xudong

arXiv.org Artificial Intelligence

In many classification models, data is discretized to better estimate its distribution. Existing discretization methods often aim to maximize the discriminant power of the discretized data, overlooking the fact that the primary goal of data discretization in classification is to improve generalization performance. As a result, the data tend to be over-split into many small bins, since the data without discretization retain the maximal discriminant information. Thus, we propose a Max-Dependency-Min-Divergence (MDmD) criterion that maximizes both the discriminant information and the generalization ability of the discretized data. More specifically, the Max-Dependency criterion maximizes the statistical dependency between the discretized data and the classification variable, while the Min-Divergence criterion explicitly minimizes the JS-divergence between the training data and the validation data for a given discretization scheme. The proposed MDmD criterion is technically appealing, but it is difficult to reliably estimate the high-order joint distributions of attributes and the classification variable. We hence further propose a more practical solution, the Max-Relevance-Min-Divergence (MRmD) discretization scheme, in which each attribute is discretized separately, simultaneously maximizing the discriminant information and the generalization ability of the discretized data. The proposed MRmD is compared with state-of-the-art discretization algorithms under the naive Bayes classification framework on 45 machine-learning benchmark datasets, where it significantly outperforms all the compared methods on most of the datasets.
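
As a rough illustration of the MRmD idea, the per-attribute score below combines the two terms the abstract names: mutual information between the discretized attribute and the class (Max-Relevance), and the JS divergence between training and validation bin histograms (Min-Divergence). The trade-off weight lam and the exact normalization are assumptions; the paper's formulation may differ.

# Illustrative per-attribute MRmD-style score (not the paper's exact
# formulation): relevance = I(discretized attribute; class label),
# divergence = JS divergence between train/validation bin histograms.
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.metrics import mutual_info_score

def mrmd_score(x_train, y_train, x_val, bin_edges, lam=1.0):
    b_train = np.digitize(x_train, bin_edges)
    b_val = np.digitize(x_val, bin_edges)
    relevance = mutual_info_score(b_train, y_train)          # Max-Relevance
    n_bins = len(bin_edges) + 1
    p = np.bincount(b_train, minlength=n_bins) / len(b_train)
    q = np.bincount(b_val, minlength=n_bins) / len(b_val)
    divergence = jensenshannon(p, q) ** 2                    # Min-Divergence (JSD)
    return relevance - lam * divergence

A greedy discretizer could then add whichever candidate cut point most increases this score and stop once no candidate improves it.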


Boosting the Discriminant Power of Naive Bayes

Wang, Shihe, Ren, Jianfeng, Lian, Xiaoyu, Bai, Ruibin, Jiang, Xudong

arXiv.org Artificial Intelligence

Naive Bayes has been widely used in many applications because of its simplicity and its ability to handle both numerical and categorical data. However, its lack of modeling of correlations between features limits its performance. In addition, noise and outliers in real-world datasets also greatly degrade the classification performance. In this paper, we propose a feature augmentation method employing a stack auto-encoder to reduce the noise in the data and boost the discriminant power of naive Bayes. The proposed stack auto-encoder consists of two auto-encoders for different purposes. The first encoder shrinks the initial features to derive a compact feature representation, removing noise and redundant information. The second encoder boosts the discriminant power of the features by expanding them into a higher-dimensional space, so that different classes of samples can be better separated in the higher-dimensional space. By integrating the proposed feature augmentation method with regularized naive Bayes, the discrimination power of the model is greatly enhanced. The proposed method is evaluated on a set of machine-learning benchmark datasets, and the experimental results show that it significantly and consistently outperforms state-of-the-art naive Bayes classifiers.
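
The two-stage shrink-then-expand structure is straightforward to sketch. The PyTorch module below is illustrative only; the layer sizes, activations, and training details are assumptions, not the paper's configuration.

# Minimal PyTorch sketch of the two-stage auto-encoder idea described above.
import torch
import torch.nn as nn

class TwoStageAutoEncoder(nn.Module):
    def __init__(self, d_in, d_compact=16, d_expanded=256):
        super().__init__()
        # Stage 1: shrink to a compact code to remove noise and redundancy.
        self.enc1 = nn.Sequential(nn.Linear(d_in, d_compact), nn.ReLU())
        self.dec1 = nn.Linear(d_compact, d_in)
        # Stage 2: expand the compact code into a higher-dimensional space
        # where classes may be more separable.
        self.enc2 = nn.Sequential(nn.Linear(d_compact, d_expanded), nn.ReLU())
        self.dec2 = nn.Linear(d_expanded, d_compact)

    def forward(self, x):
        z = self.enc1(x)                   # compact, denoised code
        h = self.enc2(z)                   # expanded, more separable features
        return self.dec1(z), self.dec2(h), z, h

# Reconstruction losses train both stages; the expanded features h (possibly
# concatenated with the original x) are then fed to a naive Bayes classifier.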


Global Expanding, Local Shrinking: Discriminant Multi-label Learning with Missing Labels

Ma, Zhongchen, Chen, Songcan

arXiv.org Machine Learning

In multi-label learning, the issue of missing labels poses a major challenge. Many methods attempt to recover missing labels by exploiting the low-rank structure of the label matrix. However, these methods utilize only the global low-rank label structure and, to some extent, ignore both local low-rank label structures and label discriminant information, leaving room for further performance improvement. In this paper, we develop a simple yet effective discriminant multi-label learning (DM2L) method for multi-label learning with missing labels. Specifically, we impose low-rank structures on the predictions of instances from the same labels (local shrinking of rank), and a maximally separated structure (high-rank structure) on the predictions of instances from different labels (global expanding of rank). In this way, the imposed low-rank structures help model both local and global low-rank label structures, while the imposed high-rank structure helps provide more underlying discriminability. Our subsequent theoretical analysis also supports these intuitions. In addition, we provide a nonlinear extension via the kernel trick to enhance DM2L, and establish a concave-convex objective to learn these models. Compared to other methods, our method involves the fewest assumptions and only one hyper-parameter. Even so, extensive experiments show that our method still outperforms the state-of-the-art methods.
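
The abstract does not give the objective explicitly; the sketch below is one assumption-laden reading of it, using the nuclear norm (a standard convex surrogate for rank) to shrink the rank of each label's prediction block while expanding the rank of the full prediction matrix. It also makes the difference-of-convex ("concave-convex") structure visible.

# Illustrative sketch of the "local shrinking / global expanding" idea,
# not the paper's exact objective. P: (n_samples, n_labels) predictions;
# Y: observed 0/1 label matrix.
import numpy as np

def dm2l_style_penalty(P, Y, lam=1.0):
    local = sum(
        np.linalg.norm(P[Y[:, j] == 1], ord="nuc")   # local shrinking of rank
        for j in range(Y.shape[1])
        if (Y[:, j] == 1).sum() > 0
    )
    global_term = np.linalg.norm(P, ord="nuc")        # global expanding of rank
    return local - lam * global_term                  # convex minus convex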


Scalable Kernel Learning via the Discriminant Information

Al, Mert, Hou, Zejiang, Kung, Sun-Yuan

arXiv.org Machine Learning

For commonly used kernels such as the Gaussian, the gradient computations mainly consist of matrix products and linear system solutions, so they can be sped up significantly with GPU-accelerated linear system solvers. For instance, our implementation took less than 80 milliseconds to compute DI/KDI gradients on an NVIDIA P100 GPU with feature dimensionalities up to 2000 and batch sizes up to 4000, using Gaussian kernels on the 3 datasets considered. In common learning methodologies, where a linear predictor is trained in conjunction with a parametric non-linear mapping, the overall objective is to minimize a loss function averaged over the entire training sample, i.e., to minimize the expected loss over a single empirical distribution. Since DI directly measures the loss of the best linear predictor on a batch, however, stochastic gradient methods have a different interpretation when utilizing this objective. Since each mini-batch represents a different empirical distribution, DI-based training instead aims to find a feature mapping that adapts to various empirical distributions, which can reduce overfitting, analogous to how bagging can improve generalization [27].
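
The passage's characterization of DI, that it "directly measures the loss of the best linear predictor on a batch", can be sketched directly: fit a ridge-regularized linear predictor in closed form on each mini-batch of mapped features and use its loss as the objective. The ridge parameter rho and the mean-squared-error normalization are assumptions; the paper's DI/KDI criteria may be defined differently, but the computation is indeed a matrix product plus a linear system solve, as the text notes.

# Loss of the best (ridge-regularized) linear predictor on a mini-batch,
# differentiable with respect to the feature map that produced Phi.
import torch

def best_linear_predictor_loss(Phi, Y, rho=1e-3):
    # Phi: (n, d) batch of mapped features; Y: (n, c) targets.
    Phi = Phi - Phi.mean(dim=0, keepdim=True)         # center the batch
    d = Phi.shape[1]
    G = Phi.T @ Phi + rho * torch.eye(d, device=Phi.device)
    W = torch.linalg.solve(G, Phi.T @ Y)              # closed-form ridge solution
    residual = Y - Phi @ W
    return (residual ** 2).mean()                     # loss of best linear predictor

Minimizing this loss over the feature-map parameters drives each mini-batch's empirical distribution toward linear predictability, matching the interpretation given above.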


Multi-view Hybrid Embedding: A Divide-and-Conquer Approach

Xu, Jiamiao, Yu, Shujian, You, Xinge, Leng, Mengjun, Jing, Xiao-Yuan, Chen, C. L. Philip

arXiv.org Machine Learning

We present a novel cross-view classification algorithm where the gallery and probe data come from different views. A popular approach to this problem is multi-view subspace learning (MvSL), which aims to learn a latent subspace shared by multi-view data. Despite promising results on some applications, the performance of existing methods deteriorates dramatically when the multi-view data are sampled from nonlinear manifolds or suffer from heavy outliers. To circumvent this drawback, motivated by the divide-and-conquer strategy, we propose Multi-view Hybrid Embedding (MvHE), a unique method that divides the problem of cross-view classification into three subproblems and builds one model for each. Specifically, the first model is designed to remove view discrepancy, whereas the second and third models attempt to discover the intrinsic nonlinear structure and to increase discriminability for intra-view and inter-view samples, respectively. A kernel extension is conducted to further boost the representation power of MvHE. Extensive experiments are conducted on four benchmark datasets. Our method demonstrates overwhelming advantages over state-of-the-art MvSL-based cross-view classification approaches in terms of classification accuracy and robustness.
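
For intuition only, here is a rough pipeline in the spirit of the three subproblems, with generic stand-ins (CCA to reduce view discrepancy, locally linear embedding for nonlinear intra-view structure, LDA for discriminability). These components are analogies chosen for illustration and are not the actual MvHE models.

# Divide-and-conquer sketch: one generic model per subproblem.
from sklearn.cross_decomposition import CCA
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def mvhe_style_pipeline(X_view1, X_view2, y, d_shared=10, d_local=10):
    # Subproblem 1: remove view discrepancy via a shared latent space.
    cca = CCA(n_components=d_shared).fit(X_view1, X_view2)
    Z1, Z2 = cca.transform(X_view1, X_view2)
    # Subproblem 2: capture intrinsic nonlinear structure within each view.
    Z1 = LocallyLinearEmbedding(n_components=d_local).fit_transform(Z1)
    Z2 = LocallyLinearEmbedding(n_components=d_local).fit_transform(Z2)
    # Subproblem 3: increase discriminability of the embedded samples.
    lda = LinearDiscriminantAnalysis().fit(Z1, y)    # trained on the gallery view
    return lda, Z1, Z2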


Web Page Classification Based on Uncorrelated Semi-Supervised Intra-View and Inter-View Manifold Discriminant Feature Extraction

Jing, Xiao-Yuan (Wuhan University) | Liu, Qian (Wuhan University and Nanjing University of Posts and Telecommunications) | Wu, Fei (Wuhan University) | Xu, Baowen (Wuhan University) | Zhu, Yangping (Wuhan University) | Chen, Songcan (Nanjing University of Aeronautics and Astronautics)

AAAI Conferences

Web page classification has attracted increasing research interest. It is intrinsically a multi-view and semi-supervised application, since web pages usually contain two or more types of data, such as text, hyperlinks and images, and unlabeled pages generally far outnumber labeled ones. Web page data is commonly high-dimensional. Thus, how to extract useful features from this kind of data in the multi-view semi-supervised scenario is important for web page classification. To our knowledge, only one method has been specifically presented for this topic, and among the few semi-supervised multi-view feature extraction methods designed for other applications, there still exists much room for improvement. In this paper, we first design a feature extraction schema called semi-supervised intra-view and inter-view manifold discriminant (SI2MD) learning, which fully utilizes the intra-view and inter-view discriminant information of labeled samples and the local neighborhood structures of unlabeled samples. We then design a semi-supervised uncorrelation constraint for the SI2MD schema to remove the multi-view correlation in the semi-supervised scenario. By combining the SI2MD schema with this constraint, we propose an uncorrelated semi-supervised intra-view and inter-view manifold discriminant (USI2MD) learning approach for web page classification. Experiments on public web page databases validate the proposed approach.
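
As background for what such a schema optimizes, the sketch below shows a generic semi-supervised discriminant embedding backbone: labeled samples contribute between- and within-class scatter, unlabeled samples contribute a neighborhood-graph Laplacian penalty, and the projection comes from a generalized eigenproblem. This is a common template, not the SI2MD/USI2MD algorithm itself, and the weight mu and neighborhood size k are assumptions.

# Generic semi-supervised discriminant embedding (single-view, illustrative).
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def semi_supervised_embedding(X_lab, y_lab, X_unlab, dim=10, mu=0.1, k=5):
    d = X_lab.shape[1]
    Sb = np.zeros((d, d)); Sw = np.zeros((d, d))
    m = X_lab.mean(axis=0)
    for c in np.unique(y_lab):                       # labeled discriminant info
        Xc = X_lab[y_lab == c]; mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - m, mc - m)
        Sw += (Xc - mc).T @ (Xc - mc)
    W = kneighbors_graph(X_unlab, k, mode="connectivity")
    W = (0.5 * (W + W.T)).toarray()                  # symmetrized neighbor graph
    L = np.diag(W.sum(axis=1)) - W                   # graph Laplacian
    reg = X_unlab.T @ L @ X_unlab                    # local-neighborhood penalty
    # Maximize between-class scatter against within-class scatter + smoothness.
    vals, vecs = eigh(Sb, Sw + mu * reg + 1e-6 * np.eye(d))
    return vecs[:, np.argsort(vals)[::-1][:dim]]     # top projection directions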


Multilinear Maximum Distance Embedding Via L1-Norm Optimization

Liu, Yang (The Hong Kong Polytechnic University) | Liu, Yan (The Hong Kong Polytechnic University) | Chan, Keith C. C. (The Hong Kong Polytechnic University)

AAAI Conferences

Dimensionality reduction plays an important role in many machine learning and pattern recognition tasks. In this paper, we present a novel dimensionality reduction algorithm called multilinear maximum distance embedding (M2DE), which includes three key components. To preserve the local geometry and discriminant information in the embedded space, M2DE utilizes a new objective function that aims to maximize the distances between particular pairs of data points, such as the distances between nearby points and the distances between data points from different classes. To make the mapping of new data points straightforward and, more importantly, to keep the natural tensor structure of high-order data, M2DE integrates multilinear techniques to learn the transformation matrices sequentially. To provide reasonable and stable embedding results, M2DE employs the L1-norm, which is more robust to outliers, to measure the dissimilarity between data points. Experiments on various datasets demonstrate that M2DE achieves good embedding results on high-order data for classification tasks.
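
A drastically simplified sketch of the objective is shown below: a single linear map rather than the sequential multilinear transformations, and plain gradient ascent rather than the paper's optimization. It learns a projection that maximizes the L1 distances between chosen pairs of points, such as nearby points from different classes.

# Simplified maximum-distance embedding with an L1 dissimilarity (illustrative).
import torch

def m2de_style_projection(X, pairs, d_out=10, steps=200, lr=1e-2):
    # X: (n, d) data tensor; pairs: list of (i, j) index pairs whose
    # projected L1 distance should be made large.
    n, d = X.shape
    U = torch.randn(d, d_out, requires_grad=True)
    opt = torch.optim.Adam([U], lr=lr)
    i, j = zip(*pairs)
    for _ in range(steps):
        Z = X @ U
        # The L1 norm is less sensitive to outliers than L2, hence its use here.
        dist = (Z[list(i)] - Z[list(j)]).abs().sum()
        loss = -dist + (U.T @ U - torch.eye(d_out)).pow(2).sum()  # soft orthogonality
        opt.zero_grad(); loss.backward(); opt.step()
    return U.detach()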